Self-supervised Learning of Motion Capture

Authors

Hsiao-Yu Tung

Hsiao-Wei Tung

Ersin Yumer

Katerina Fragkiadaki

Abstract

Current state-of-the-art solutions for motion capture from a single camera are optimization-driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g. person segmentation, optical flow, keypoint detections). Such optimization is susceptible to local minima. This has been the bottleneck that forced the use of clean, green-screen-like backgrounds at capture time, manual initialization, or a switch to multiple cameras as input. In this work, we propose a learning-based motion capture model for single-camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained using a combination of strong supervision from synthetic data and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation, in an end-to-end framework. Empirically, we show that our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time without manual effort. Self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data, and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
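
The keypoint re-projection idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper adapts neural network weights by back-propagating through differentiable rendering, whereas the sketch below refines only a hypothetical 3D translation vector and swaps in finite-difference gradients for back-propagation. The pinhole camera, the function names (`project`, `reprojection_loss`, `refine_translation`), and the step size are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(points: Sequence[Point3], focal: float = 1.0) -> List[Point2]:
    """Pinhole projection of 3D points; the camera looks down +z (assumed model)."""
    return [(focal * x / z, focal * y / z) for x, y, z in points]


def translate(points: Sequence[Point3], t: Sequence[float]) -> List[Point3]:
    """Shift every 3D point by the translation t = (tx, ty, tz)."""
    return [(x + t[0], y + t[1], z + t[2]) for x, y, z in points]


def reprojection_loss(points: Sequence[Point3],
                      keypoints2d: Sequence[Point2],
                      focal: float = 1.0) -> float:
    """Mean squared distance between projected joints and 2D keypoint detections."""
    projected = project(points, focal)
    total = sum((u - ku) ** 2 + (v - kv) ** 2
                for (u, v), (ku, kv) in zip(projected, keypoints2d))
    return total / len(keypoints2d)


def refine_translation(points: Sequence[Point3],
                       keypoints2d: Sequence[Point2],
                       t0: Sequence[float],
                       steps: int = 200,
                       lr: float = 0.5,
                       eps: float = 1e-5) -> Tuple[float, ...]:
    """Gradient descent on the translation, using forward finite differences.

    Finite differences stand in for back-propagation through a
    differentiable renderer; they are a simplification, not the paper's method.
    """
    t = list(t0)
    for _ in range(steps):
        base = reprojection_loss(translate(points, t), keypoints2d)
        grad = []
        for i in range(3):
            shifted = list(t)
            shifted[i] += eps
            bumped = reprojection_loss(translate(points, shifted), keypoints2d)
            grad.append((bumped - base) / eps)
        t = [ti - lr * gi for ti, gi in zip(t, grad)]
    return tuple(t)


if __name__ == "__main__":
    # Hypothetical four-joint "skeleton" placed 4 units in front of the camera.
    joints = [(0.0, 0.5, 4.0), (0.0, 0.0, 4.0), (-0.3, -0.5, 4.0), (0.3, -0.5, 4.0)]
    # Synthetic 2D detections generated from a known offset.
    detections = project(translate(joints, (0.3, -0.2, 0.0)))
    start = (0.0, 0.0, 0.0)
    refined = refine_translation(joints, detections, start)
    print("loss before:", reprojection_loss(translate(joints, start), detections))
    print("loss after: ", reprojection_loss(translate(joints, refined), detections))
```

In this toy setting the loss starts from a reasonable initial guess and is driven down by matching the 2D measurements, which mirrors, in a very reduced form, how a supervised initialization can be tightened by self-supervised re-projection at test time.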

Similar resources

Improving Accuracy of Inertial Measurement Units using Support Vector Regression

An inertial measurement unit (IMU) is a sensor that measures acceleration and angular velocity. It has become increasingly popular due to its small size and low cost compared with a typical marker-based motion capture system. Nonetheless, IMUs face considerable challenges, in particular noticeable inaccuracy from accumulated integration errors. In this project, we attempted to improve accuracy o...

Structure-Aware and Temporally Coherent 3D Human Pose Estimation

Deep learning methods for 3D human pose estimation from RGB images require a huge amount of domain-specific labeled data for good in-the-wild performance. However, obtaining annotated 3D pose data requires a complex motion capture setup which is generally limited to controlled settings. We propose a semi-supervised learning method using a structure-aware loss function which is able to utilize a...

Semi-supervised Learning with Encoder-Decoder Recurrent Neural Networks: Experiments with Motion Capture Sequences

Recent work on sequence to sequence translation using Recurrent Neural Networks (RNNs) based on Long Short Term Memory (LSTM) architectures has shown great potential for learning useful representations of sequential data. A one-to-many encoder-decoder(s) scheme allows for a single encoder to provide representations serving multiple purposes. In our case, we present an LSTM encoder network able ...

Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks

The Kinect skeleton tracker is able to achieve considerable human body tracking performance in a convenient and low-cost manner. However, the tracker often captures unnatural human poses, such as discontinuous and vibrating motions, when self-occlusions occur. A majority of approaches tackle this problem by using multiple Kinect sensors in a workspace. Combination of the measurements from different se...

Prediction of Ground Reaction Forces and Moments via Supervised Learning Is Independent of Participant Sex, Height and Mass

Accurate multidimensional ground reaction forces and moments (GRF/Ms) can be predicted from marker-based motion capture using Partial Least Squares (PLS) supervised learning. In this study, the correlations between known and predicted GRF/Ms are compared depending on whether the PLS model is trained using the discrete inputs of sex, height and mass. All three variables were found to be accounte...

Publication date: 2017
